blasting Sigenae 8


Type of output


-------
Plan: take each fosmid…
assemble (Example Settings)


(Need to confirm all assemblies.)


Take consensus sequences with >20x coverage and >5000 bp, then build a non-redundant dataset.
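
A minimal sketch of this coverage/length filter in Python. The header layout written by the assembler is not recorded in these notes, so the `cov=` field read by `parse_coverage` is a hypothetical placeholder and would need adapting to the real consensus headers. The function names are illustrative, not part of any tool used here.

```python
import re

# ASSUMPTION: coverage appears in the FASTA header as "cov=<number>".
# This field name is hypothetical; adjust to the real assembler output.
COV_RE = re.compile(r"cov=(\d+(?:\.\d+)?)")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)


def parse_coverage(header):
    """Return the coverage from a header, or None if it is not present."""
    match = COV_RE.search(header)
    return float(match.group(1)) if match else None


def filter_fasta(src, dst, min_cov=20.0, min_len=5000):
    """Write records with coverage > min_cov and length > min_len to dst.

    Records without a recognizable coverage value are skipped.
    Returns the number of records kept.
    """
    kept = 0
    for header, seq in read_fasta(src):
        cov = parse_coverage(header)
        if cov is not None and cov > min_cov and len(seq) > min_len:
            dst.write(f">{header}\n{seq}\n")
            kept += 1
    return kept
```

Both thresholds are strict (greater than), which matches the shortest sequence of 5001 reported in the cd-hit-est logs.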

Single fasta (~45k) generated

Using Local cd-hit-est


Seemed to work




Combined_fosmid_cd-est.clstr
Combined_fosmid_cd-est.bak.clstr
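
To check what the clustering did, the `.clstr` file can be summarized. The sketch below assumes the usual CD-HIT cluster layout (a `>Cluster N` line followed by member lines such as `0	5001nt, >name... *`); `parse_clstr` and `summarize` are illustrative names, not part of CD-HIT.

```python
import re

# Member line, e.g. "1\t5001nt, >contig_2... at +/98.50%"
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:nt|aa), >(.+?)\.\.\.\s*(.*)$")


def parse_clstr(lines):
    """Parse CD-HIT .clstr lines into {cluster_name: [(seq_name, length, is_rep)]}."""
    clusters = {}
    current = None
    for line in lines:
        line = line.rstrip()
        if line.startswith(">Cluster"):
            current = line[1:]
            clusters[current] = []
        elif line and current is not None:
            match = MEMBER_RE.match(line)
            if match:
                length, name, rest = match.groups()
                # The representative sequence of a cluster is marked with "*".
                clusters[current].append((name, int(length), rest.strip() == "*"))
    return clusters


def summarize(clusters):
    """Return cluster count, total member count, and number of singleton clusters."""
    sizes = [len(members) for members in clusters.values()]
    return {
        "clusters": len(sizes),
        "sequences": sum(sizes),
        "singletons": sum(1 for size in sizes if size == 1),
    }
```

Usage would be along the lines of `summarize(parse_clstr(open("Combined_fosmid_cd-est.clstr")))`. The `sequences` total should equal the number of input sequences, and `clusters` is the size of the non-redundant set.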



Also running BLAST of the redundant file against itself




Multi BLAST (48409 sequences).txt





Rerunning cd-hit-est with more stringency.


robertsmac:cd-hit-v4.5.4-2011-03-07 sr320$ ./cd-hit-est -i /Volumes/Bay4\ scratch/Combined_fasta_fosmids.fa -o /Volumes/Bay4\ scratch/Combined_fosmid_cd-est_V2 -c 0.95 -B 1
================================================================
Program: CD-HIT, V4.5.4, Feb 23 2012, 11:03:06
Command: ./cd-hit-est -i
         /Volumes/Bay4 scratch/Combined_fasta_fosmids.fa -o
         /Volumes/Bay4 scratch/Combined_fosmid_cd-est_V2 -c
         0.95 -B 1

Started: Thu Feb 23 13:34:05 2012
================================================================
                            Output                             
----------------------------------------------------------------
total seq: 48409
longest and shortest : 45743 and 5001
Total letters: 369548689
Sequences have been sorted

Approximated minimal memory consumption:
Sequence        : 5M
Buffer          : 1 X 22M = 22M
Table           : 1 X 17M = 17M
Miscellaneous   : 4M
Total           : 50M

Table limit with the given memory limit:
Max number of representatives: 4194304
Max number of word counting entries: 93673308


CHANGED NOTHING 
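
One way to confirm that a rerun changed nothing is to compare the representative sequence names in the two output FASTA files. A minimal sketch, assuming the first whitespace-delimited token of each header is a unique sequence ID (an assumption about these files, not something recorded above):

```python
def fasta_ids(handle):
    """Collect the first header token of every record in an open FASTA handle."""
    return {
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and len(line.strip()) > 1
    }


def compare_runs(handle_a, handle_b):
    """Report how many representative IDs are shared or unique to each run."""
    ids_a, ids_b = fasta_ids(handle_a), fasta_ids(handle_b)
    return {
        "shared": len(ids_a & ids_b),
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
    }
```

If `only_a` and `only_b` are both 0, the two runs kept the same representatives.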


again
-----
robertsmac:cd-hit-v4.5.4-2011-03-07 sr320$ ./cd-hit-est -i /Volumes/Bay4\ scratch/Combined_fasta_fosmids.fa -o /Volumes/Bay4\ scratch/Combined_fosmid_cd-est_V3 -c 0.80 -B 1
================================================================
Program: CD-HIT, V4.5.4, Feb 23 2012, 11:03:06
Command: ./cd-hit-est -i
         /Volumes/Bay4 scratch/Combined_fasta_fosmids.fa -o
         /Volumes/Bay4 scratch/Combined_fosmid_cd-est_V3 -c
         0.80 -B 1

Started: Thu Feb 23 14:32:03 2012
================================================================
                            Output                             
----------------------------------------------------------------
total seq: 48409
longest and shortest : 45743 and 5001
Total letters: 369548689
Sequences have been sorted

Approximated minimal memory consumption:
Sequence        : 5M
Buffer          : 1 X 22M = 22M
Table           : 1 X 17M = 17M
Miscellaneous   : 4M
Total           : 50M

Table limit with the given memory limit:
Max number of representatives: 4194304
Max number of word counting entries: 93673308

#fail


-
USER GUIDE
http://weizhong-lab.ucsd.edu/cd-hit/wiki/doku.php?id=cd-hit_user_guide
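
The user guide pairs each cd-hit-est identity threshold (`-c`) with a recommended word size (`-n`). The sketch below encodes that table as I understand it from the guide. The guide's ranges share their endpoints, so assigning a boundary value such as 0.95 to the higher band is my assumption. `recommended_word_sizes` and `build_command` are hypothetical helpers, not part of CD-HIT.

```python
# (lower bound of -c band, recommended -n values) for cd-hit-est,
# taken from the CD-HIT user guide; boundary handling is an assumption.
WORD_SIZE_TABLE = [
    (0.95, (10, 11)),
    (0.90, (8, 9)),
    (0.88, (7,)),
    (0.85, (6,)),
    (0.80, (5,)),
    (0.75, (4,)),
]


def recommended_word_sizes(identity):
    """Return the recommended -n values for a cd-hit-est -c threshold."""
    if not 0.75 <= identity <= 1.0:
        raise ValueError("the table used here covers -c from 0.75 to 1.0")
    for lower, sizes in WORD_SIZE_TABLE:
        if identity >= lower:
            return sizes
    raise ValueError("unreachable for identities within the covered range")


def build_command(fasta_in, out_prefix, identity, word_size=None):
    """Build a cd-hit-est argument list with a consistent -c / -n pair."""
    sizes = recommended_word_sizes(identity)
    if word_size is None:
        word_size = sizes[0]
    elif word_size not in sizes:
        raise ValueError(f"-n {word_size} is not recommended for -c {identity}")
    return [
        "./cd-hit-est",
        "-i", fasta_in,
        "-o", out_prefix,
        "-c", f"{identity:.2f}",
        "-n", str(word_size),
    ]
```

The list can be passed to `subprocess.run` unchanged, which also avoids the backslash escaping needed for the space in `/Volumes/Bay4 scratch`.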





and again

robertsmac:cd-hit-v4.5.4-2011-03-07 sr320$ ./cd-hit-est -i /Volumes/Bay4\ scratch/Combined_fasta_fosmids.fa -o /Volumes/Bay4\ scratch/Combined_fosmid_cd-est_V5 -c 0.88 -n 7 -M 0

extremely slow
#fail


going to try the original command again
./cd-hit-est -i /Volumes/Bay4\ scratch/Combined_fasta_fosmids.fa -o /Volumes/Bay4\ scratch/Combined_fosmid_cd-est_V6

worked just fine.

Stayed with the original cd-hit-est run (default settings).